perf(profiling): try to intern strings/functions #15062

KowalskiThomas · 2025-10-28T11:33:04Z

Description

This PR updates Python Profiling / Stack V2 to use string interning features added by @morrisonlevi in libdatadog (DataDog/libdatadog@main...levi/phase1). This branch adds a concept of ProfilesDictionary in which profilers can intern strings and functions.

The general ideas are the following:

To avoid, as much as possible, having to cross FFI boundaries, we keep track of strings that we have already interned. Since Echion has its own interning (StringTable), we rely on what Echion gives us when it refers to strings (StringTable::Key, which really is just a void*) and keep a map of EchionStringId → LibDatadogStringId.
- When we see an Echion String ID, we check whether our map contains it; if not, we fetch the actual std::string_view from Echion's StringTable and intern it into libdatadog. Once we have a libdatadog String ID, we can use it in our sampling logic.
To keep things clean, I had to create an interface in ddup that abstracts away libdatadog concepts; I could not leak the ddog_prof_StringId2 type to ddup users.
- The result is the new string_id_t and function_id_t types (which in practice are both just void*, which is also what ddog_prof_StringId2 really is).
- We also have string_id_t ddup_intern_string(std::string_view s); (and similarly for functions) that Stack V2 can call to intern things.
Another thing we intern is label keys (for push_label methods).
- Note that Levi's new APIs do not support interning label values and units. The reason is that, whereas the number of label keys is assumed to be bounded (e.g. "Thread Name", "Task Name", etc.), the number of label values could be unbounded (think actual Task names, Thread names, etc.). Interning them would in practice be a memory leak.
- Interning of label keys is handled automatically by ddup, and Stack V2 does not know about it (Stack V2 only deals in ExportLabelKey, which is an enum).
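
To make the caching idea above concrete, here is a minimal sketch of a map from Echion string IDs to dictionary string IDs. It is not the code in this PR: ToyDictionary is a hypothetical stand-in for the libdatadog ProfilesDictionary (the real one lives behind the FFI), and InternCache, echion_string_id_t and the resolve callback are names made up for illustration. Only the shape of the lookup (check the map first, intern on a miss) comes from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Assumption: both kinds of ID are opaque pointers, as the description says.
using echion_string_id_t = const void*; // plays the role of StringTable::Key
using string_id_t = const void*;        // plays the role of ddup's string_id_t

// Hypothetical stand-in for the dictionary; not the libdatadog API.
// It owns one copy of each string and hands out a stable pointer to it.
class ToyDictionary {
public:
    string_id_t intern(std::string_view s) {
        ++calls_; // counts how often the "expensive" boundary is crossed
        auto it = index_.find(std::string(s));
        if (it != index_.end()) return it->second;
        storage_.emplace_back(s); // deque growth keeps element addresses stable
        string_id_t id = &storage_.back();
        index_.emplace(storage_.back(), id);
        return id;
    }
    std::size_t calls() const { return calls_; }

private:
    std::deque<std::string> storage_;
    std::unordered_map<std::string, string_id_t> index_;
    std::size_t calls_ = 0;
};

// Cache from Echion string ID to dictionary string ID.
class InternCache {
public:
    explicit InternCache(ToyDictionary& dict) : dict_(dict) {}

    // `resolve` is only called on a miss; it models fetching the
    // std::string_view for `key` from Echion's StringTable.
    template <typename Resolve>
    string_id_t get(echion_string_id_t key, Resolve&& resolve) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second; // hit: no interning call
        string_id_t id = dict_.intern(resolve(key));
        assert(id != nullptr);
        cache_.emplace(key, id);
        return id;
    }

    void clear() { cache_.clear(); } // e.g. when the dictionary is replaced
    std::size_t size() const { return cache_.size(); }

private:
    ToyDictionary& dict_;
    std::unordered_map<echion_string_id_t, string_id_t> cache_;
};
```

With this shape, a string seen in many samples costs one interning call the first time and one hash lookup on a pointer afterwards.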

One of the most challenging parts of this PR is getting threading and forking right:

We need to be thread-safe
- ProfilesDictionary itself is thread safe...
- ... but its initialisation in ddup isn't
- We want to avoid depending on call_once which has a significant runtime cost in get_profiles_dictionary, so we need to carefully initialise things in the right order
We need to be fork-safe
- What we call ProfilesDictionary is actually a pointer to (an Arc to) a Profiles Dictionary Rust object.
- Upon forking, the child process needs to reset its interning state.
  - The reason is that cached libdatadog String IDs (which really are just pointers) still refer to pointers that exist in the parent process' memory, not the child's.
  - ⇒ There's no way around it, we need to forget them all and start caching/interning from scratch again.
  - However, the parent's heap is cloned to the child process, meaning we still have to decrease the refcount of the inherited Profiles Dictionary before initialising a new one, in order to avoid a memory leak.
- Additionally, anything that relates/refers to data inside the Profiles Dictionary needs to be reset as well
  - Interned strings in Stack V2, i.e. the cache from Echion String ID to libdatadog String ID
  - String IDs for interned label key strings
We need to clean things up properly.
- When we exit, we need to decrease the refcount for the Profiles Dictionary so that it is properly freed. This is done using an std::atexit hook.
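
The fork and exit handling described above could be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: ToyDictionary, InternState, reset_after_fork, release_at_exit and init_interning are hypothetical names, the refcount is a plain atomic counter standing in for the Rust Arc, and registering the child-side reset through pthread_atfork is an assumption (the description only says the child must reset its state). std::atexit is the hook the description names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <unordered_map>

#include <pthread.h>

// Hypothetical refcounted object standing in for the Arc'd Rust dictionary.
struct ToyDictionary {
    std::atomic<int> refcount{1};
};

ToyDictionary* toy_dictionary_new() { return new ToyDictionary(); }

// Drop one reference; free the object when the last reference goes away.
void toy_dictionary_release(ToyDictionary* d) {
    if (d != nullptr && d->refcount.fetch_sub(1) == 1) delete d;
}

// Everything that refers to data inside the dictionary lives together,
// so it can be reset together.
struct InternState {
    ToyDictionary* dictionary = nullptr;
    // Echion string ID -> dictionary string ID (both opaque pointers).
    std::unordered_map<const void*, const void*> string_cache;
};

InternState& state() {
    static InternState s;
    return s;
}

// Child side of fork(): forget every cached ID, drop the reference to the
// inherited dictionary, then start again with a fresh one.
void reset_after_fork() {
    InternState& s = state();
    s.string_cache.clear();
    toy_dictionary_release(s.dictionary);
    s.dictionary = toy_dictionary_new();
}

// Process exit: drop our reference so the dictionary is freed.
void release_at_exit() {
    InternState& s = state();
    s.string_cache.clear();
    toy_dictionary_release(s.dictionary);
    s.dictionary = nullptr;
}

// One-time setup. Not thread-safe on its own: the caller is assumed to run
// it before any sampling thread starts, which is the ordering concern the
// description raises.
void init_interning() {
    InternState& s = state();
    if (s.dictionary != nullptr) return;
    s.dictionary = toy_dictionary_new();
    pthread_atfork(nullptr, nullptr, reset_after_fork); // assumption, see above
    std::atexit(release_at_exit);
}
```

Keeping the cache and the dictionary pointer in one struct means the reset path cannot clear one and forget the other; label key IDs would belong in the same struct.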

More performance figures are available in this Notebook.

Testing

Risks

Additional Notes

github-actions · 2025-10-28T11:33:43Z

CODEOWNERS have been resolved as:

ddtrace/internal/datadog/profiling/dd_wrapper/include/ddup_interface.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/libdatadog_helpers.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/profile.hpp       @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/sample.hpp        @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/sample_manager.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/ddup_interface.cpp    @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp           @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/sample.cpp            @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/sample_manager.cpp    @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/test/test_forking.cpp     @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/include/sampler.hpp         @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/include/stack_renderer.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/src/sampler.cpp             @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack_v2/src/stack_renderer.cpp      @DataDog/profiling-python
src/native/.cargo/config.toml                                           @DataDog/apm-core-python
src/native/Cargo.lock                                                   @DataDog/apm-core-python
src/native/Cargo.toml                                                   @DataDog/apm-core-python

github-actions · 2025-10-28T11:59:07Z

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 236 ± 2 ms.

The average import time from base is: 239 ± 2 ms.

The import time difference between this PR and base is: -2.53 ± 0.1 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.048 ms (0.87%)

ddtrace.bootstrap.sitecustomize 1.332 ms (0.56%)

ddtrace.bootstrap.preload 1.332 ms (0.56%)

ddtrace.internal.remoteconfig.client 0.628 ms (0.27%)

ddtrace 0.716 ms (0.30%)

ddtrace.internal._unpatched 0.064 ms (0.03%)

subprocess 0.034 ms (0.01%)

contextlib 0.034 ms (0.01%)

json 0.030 ms (0.01%)

json.decoder 0.030 ms (0.01%)

re 0.030 ms (0.01%)

enum 0.030 ms (0.01%)

types 0.030 ms (0.01%)

KowalskiThomas · 2025-10-30T09:29:54Z

Based on the failing jobs, it seems that the current version of my PR makes the profiler not behave well in all forking scenarios:

========================== Datadog Auto Test Retries ===========================
____________________________________ FAILED ____________________________________
FAILED tests/profiling_v2/test_main.py::test_fork
FAILED tests/profiling_v2/test_gunicorn.py::test_gunicorn

I'll try to understand why...

pr-commenter · 2025-10-30T09:46:27Z

Performance SLOs

Comparing candidate kowalski/perf-profiling-intern-strings-into-libdatadog (ef96a13) with baseline main (c529066)

📈 Performance Regressions (1 suite)

📈 iastaspectsospath - 24/24

✅ ospathbasename_aspect

Time: ✅ 4.243µs (SLO: <10.000µs 📉 -57.6%) vs baseline: -0.6%

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.5%

✅ ospathbasename_noaspect

Time: ✅ 1.094µs (SLO: <10.000µs 📉 -89.1%) vs baseline: +1.1%

Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.8%

✅ ospathjoin_aspect

Time: ✅ 6.154µs (SLO: <10.000µs 📉 -38.5%) vs baseline: ~same

Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +5.4%

✅ ospathjoin_noaspect

Time: ✅ 2.297µs (SLO: <10.000µs 📉 -77.0%) vs baseline: ~same

Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +4.6%

✅ ospathnormcase_aspect

Time: ✅ 3.863µs (SLO: <10.000µs 📉 -61.4%) vs baseline: 📈 +10.2%

Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.5%

✅ ospathnormcase_noaspect

Time: ✅ 0.577µs (SLO: <10.000µs 📉 -94.2%) vs baseline: +0.5%

Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +5.0%

✅ ospathsplit_aspect

Time: ✅ 4.770µs (SLO: <10.000µs 📉 -52.3%) vs baseline: -0.4%

Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.2%

✅ ospathsplit_noaspect

Time: ✅ 1.595µs (SLO: <10.000µs 📉 -84.1%) vs baseline: +0.9%

Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.7%

✅ ospathsplitdrive_aspect

Time: ✅ 3.623µs (SLO: <10.000µs 📉 -63.8%) vs baseline: -1.7%

Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.3%

✅ ospathsplitdrive_noaspect

Time: ✅ 0.697µs (SLO: <10.000µs 📉 -93.0%) vs baseline: -1.4%

Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.6%

✅ ospathsplitext_aspect

Time: ✅ 4.517µs (SLO: <10.000µs 📉 -54.8%) vs baseline: -1.2%

Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.5%

✅ ospathsplitext_noaspect

Time: ✅ 1.385µs (SLO: <10.000µs 📉 -86.2%) vs baseline: +0.3%

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.9%

🟡 Near SLO Breach (4 suites)

🟡 djangosimple - 30/30

✅ appsec

Time: ✅ 20.506ms (SLO: <22.300ms -8.0%) vs baseline: ~same

Memory: ✅ 66.139MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.2%

✅ exception-replay-enabled

Time: ✅ 1.351ms (SLO: <1.450ms -6.8%) vs baseline: +0.7%

Memory: ✅ 64.346MB (SLO: <67.000MB -4.0%) vs baseline: +4.9%

✅ iast

Time: ✅ 20.429ms (SLO: <22.250ms -8.2%) vs baseline: +0.2%

Memory: ✅ 66.159MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0%

✅ profiler

Time: ✅ 15.548ms (SLO: <16.550ms -6.1%) vs baseline: ~same

Memory: ✅ 54.004MB (SLO: <54.500MB 🟡 -0.9%) vs baseline: +4.9%

✅ resource-renaming

Time: ✅ 20.514ms (SLO: <21.750ms -5.7%) vs baseline: ~same

Memory: ✅ 66.168MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0%

✅ span-code-origin

Time: ✅ 25.345ms (SLO: <28.200ms 📉 -10.1%) vs baseline: ~same

Memory: ✅ 67.227MB (SLO: <69.500MB -3.3%) vs baseline: +4.3%

✅ tracer

Time: ✅ 20.456ms (SLO: <21.750ms -6.0%) vs baseline: ~same

Memory: ✅ 66.153MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0%

✅ tracer-and-profiler

Time: ✅ 22.707ms (SLO: <23.500ms -3.4%) vs baseline: -0.2%

Memory: ✅ 67.849MB (SLO: <68.000MB 🟡 -0.2%) vs baseline: +4.9%

✅ tracer-dont-create-db-spans

Time: ✅ 19.291ms (SLO: <21.500ms 📉 -10.3%) vs baseline: -0.4%

Memory: ✅ 66.198MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0%

✅ tracer-minimal

Time: ✅ 16.679ms (SLO: <17.500ms -4.7%) vs baseline: +0.3%

Memory: ✅ 65.805MB (SLO: <67.000MB 🟡 -1.8%) vs baseline: +4.5%

✅ tracer-native

Time: ✅ 20.497ms (SLO: <21.750ms -5.8%) vs baseline: +0.3%

Memory: ✅ 67.869MB (SLO: <72.500MB -6.4%) vs baseline: +5.1%

✅ tracer-no-caches

Time: ✅ 18.460ms (SLO: <19.650ms -6.1%) vs baseline: +0.3%

Memory: ✅ 66.208MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.2%

✅ tracer-no-databases

Time: ✅ 18.781ms (SLO: <20.100ms -6.6%) vs baseline: ~same

Memory: ✅ 65.824MB (SLO: <67.000MB 🟡 -1.8%) vs baseline: +4.6%

✅ tracer-no-middleware

Time: ✅ 20.183ms (SLO: <21.500ms -6.1%) vs baseline: +0.2%

Memory: ✅ 66.198MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.2%

✅ tracer-no-templates

Time: ✅ 20.334ms (SLO: <22.000ms -7.6%) vs baseline: ~same

Memory: ✅ 66.218MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.1%

🟡 errortrackingdjangosimple - 6/6

✅ errortracking-enabled-all

Time: ✅ 18.020ms (SLO: <19.850ms -9.2%) vs baseline: -0.2%

Memory: ✅ 66.070MB (SLO: <66.500MB 🟡 -0.6%) vs baseline: +4.6%

✅ errortracking-enabled-user

Time: ✅ 18.307ms (SLO: <19.400ms -5.6%) vs baseline: +1.4%

Memory: ✅ 66.092MB (SLO: <66.500MB 🟡 -0.6%) vs baseline: +4.6%

✅ tracer-enabled

Time: ✅ 18.109ms (SLO: <19.450ms -6.9%) vs baseline: +0.3%

Memory: ✅ 65.888MB (SLO: <66.500MB 🟡 -0.9%) vs baseline: +5.0%

🟡 errortrackingflasksqli - 6/6

✅ errortracking-enabled-all

Time: ✅ 2.073ms (SLO: <2.300ms -9.9%) vs baseline: ~same

Memory: ✅ 52.553MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.7%

✅ errortracking-enabled-user

Time: ✅ 2.097ms (SLO: <2.250ms -6.8%) vs baseline: +1.3%

Memory: ✅ 52.652MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +4.8%

✅ tracer-enabled

Time: ✅ 2.100ms (SLO: <2.300ms -8.7%) vs baseline: +1.6%

Memory: ✅ 52.514MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.6%

🟡 flasksimple - 18/18

✅ appsec-get

Time: ✅ 4.606ms (SLO: <4.750ms -3.0%) vs baseline: ~same

Memory: ✅ 62.367MB (SLO: <65.000MB -4.1%) vs baseline: +5.0%

✅ appsec-post

Time: ✅ 6.624ms (SLO: <6.750ms 🟡 -1.9%) vs baseline: ~same

Memory: ✅ 62.370MB (SLO: <65.000MB -4.0%) vs baseline: +4.9%

✅ appsec-telemetry

Time: ✅ 4.611ms (SLO: <4.750ms -2.9%) vs baseline: +0.7%

Memory: ✅ 62.378MB (SLO: <65.000MB -4.0%) vs baseline: +5.2%

✅ debugger

Time: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: -0.2%

Memory: ✅ 45.319MB (SLO: <47.000MB -3.6%) vs baseline: +5.2%

✅ iast-get

Time: ✅ 1.861ms (SLO: <2.000ms -6.9%) vs baseline: -0.3%

Memory: ✅ 42.190MB (SLO: <49.000MB 📉 -13.9%) vs baseline: +4.8%

✅ profiler

Time: ✅ 1.914ms (SLO: <2.100ms -8.8%) vs baseline: ~same

Memory: ✅ 46.693MB (SLO: <47.000MB 🟡 -0.7%) vs baseline: +5.0%

✅ resource-renaming

Time: ✅ 3.371ms (SLO: <3.650ms -7.6%) vs baseline: +0.3%

Memory: ✅ 52.540MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.7%

✅ tracer

Time: ✅ 3.360ms (SLO: <3.650ms -7.9%) vs baseline: +0.5%

Memory: ✅ 52.531MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.7%

✅ tracer-native

Time: ✅ 3.363ms (SLO: <3.650ms -7.9%) vs baseline: +0.5%

Memory: ✅ 54.215MB (SLO: <60.000MB -9.6%) vs baseline: +5.1%

⚠️ Unstable Tests (1 suite)

⚠️ coreapiscenario - 10/10 (1 unstable)

⚠️ context_with_data_listeners

Time: ⚠️ 13.312µs (SLO: <20.000µs 📉 -33.4%) vs baseline: +0.6%

Memory: ✅ 31.595MB (SLO: <33.500MB -5.7%) vs baseline: +4.6%

✅ context_with_data_no_listeners

Time: ✅ 3.271µs (SLO: <10.000µs 📉 -67.3%) vs baseline: ~same

Memory: ✅ 31.516MB (SLO: <33.500MB -5.9%) vs baseline: +4.7%

✅ get_item_exists

Time: ✅ 0.583µs (SLO: <10.000µs 📉 -94.2%) vs baseline: +0.6%

Memory: ✅ 31.556MB (SLO: <33.500MB -5.8%) vs baseline: +4.9%

✅ get_item_missing

Time: ✅ 0.637µs (SLO: <10.000µs 📉 -93.6%) vs baseline: -0.1%

Memory: ✅ 31.556MB (SLO: <33.500MB -5.8%) vs baseline: +4.9%

✅ set_item

Time: ✅ 24.224µs (SLO: <30.000µs 📉 -19.3%) vs baseline: -0.5%

Memory: ✅ 31.634MB (SLO: <33.500MB -5.6%) vs baseline: +4.9%

✅ All Tests Passing (11 suites)

✅ httppropagationinject - 16/16

✅ ids_only

Time: ✅ 21.838µs (SLO: <30.000µs 📉 -27.2%) vs baseline: +4.1%

Memory: ✅ 32.106MB (SLO: <33.500MB -4.2%) vs baseline: +5.2%

✅ with_all

Time: ✅ 29.437µs (SLO: <40.000µs 📉 -26.4%) vs baseline: +3.3%

Memory: ✅ 32.047MB (SLO: <33.500MB -4.3%) vs baseline: +4.9%

✅ with_dd_origin

Time: ✅ 24.755µs (SLO: <30.000µs 📉 -17.5%) vs baseline: -0.2%

Memory: ✅ 32.126MB (SLO: <33.500MB -4.1%) vs baseline: +5.1%

✅ with_priority_and_origin

Time: ✅ 25.332µs (SLO: <40.000µs 📉 -36.7%) vs baseline: +4.4%

Memory: ✅ 32.047MB (SLO: <33.500MB -4.3%) vs baseline: +5.0%

✅ with_sampling_priority

Time: ✅ 21.948µs (SLO: <30.000µs 📉 -26.8%) vs baseline: +4.4%

Memory: ✅ 32.106MB (SLO: <33.500MB -4.2%) vs baseline: +4.9%

✅ with_tags

Time: ✅ 26.599µs (SLO: <40.000µs 📉 -33.5%) vs baseline: -0.4%

Memory: ✅ 32.106MB (SLO: <33.500MB -4.2%) vs baseline: +5.4%

✅ with_tags_invalid

Time: ✅ 28.073µs (SLO: <40.000µs 📉 -29.8%) vs baseline: +0.3%

Memory: ✅ 32.126MB (SLO: <33.500MB -4.1%) vs baseline: +5.1%

✅ with_tags_max_size

Time: ✅ 28.232µs (SLO: <40.000µs 📉 -29.4%) vs baseline: +4.3%

Memory: ✅ 32.047MB (SLO: <33.500MB -4.3%) vs baseline: +4.9%

✅ iast_aspects - 40/40

✅ re_expand_aspect

Time: ✅ 32.125µs (SLO: <40.000µs 📉 -19.7%) vs baseline: +0.9%

Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +5.1%

✅ re_expand_noaspect

Time: ✅ 29.751µs (SLO: <40.000µs 📉 -25.6%) vs baseline: +4.4%

Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +5.0%

✅ re_findall_aspect

Time: ✅ 2.911µs (SLO: <10.000µs 📉 -70.9%) vs baseline: ~same

Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +5.2%

✅ re_findall_noaspect

Time: ✅ 1.421µs (SLO: <10.000µs 📉 -85.8%) vs baseline: -0.1%

Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +5.0%

✅ re_finditer_aspect

Time: ✅ 4.421µs (SLO: <10.000µs 📉 -55.8%) vs baseline: +1.0%

Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +5.0%

✅ re_finditer_noaspect

Time: ✅ 1.388µs (SLO: <10.000µs 📉 -86.1%) vs baseline: -1.4%

Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +5.2%

✅ re_fullmatch_aspect

Time: ✅ 2.657µs (SLO: <10.000µs 📉 -73.4%) vs baseline: -5.3%

Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +4.9%

✅ re_fullmatch_noaspect

Time: ✅ 1.279µs (SLO: <10.000µs 📉 -87.2%) vs baseline: -1.8%

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +5.1%

✅ re_group_aspect

Time: ✅ 2.941µs (SLO: <10.000µs 📉 -70.6%) vs baseline: +0.8%

Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.0%

✅ re_group_noaspect

Time: ✅ 1.604µs (SLO: <10.000µs 📉 -84.0%) vs baseline: -0.7%

Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +4.8%

✅ re_groups_aspect

Time: ✅ 3.090µs (SLO: <10.000µs 📉 -69.1%) vs baseline: +0.3%

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +5.5%

✅ re_groups_noaspect

Time: ✅ 1.701µs (SLO: <10.000µs 📉 -83.0%) vs baseline: +0.3%

Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.5%

✅ re_match_aspect

Time: ✅ 2.683µs (SLO: <10.000µs 📉 -73.2%) vs baseline: -1.1%

Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +4.9%

✅ re_match_noaspect

Time: ✅ 1.305µs (SLO: <10.000µs 📉 -86.9%) vs baseline: ~same

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +5.0%

✅ re_search_aspect

Time: ✅ 2.562µs (SLO: <10.000µs 📉 -74.4%) vs baseline: +3.4%

Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +5.1%

✅ re_search_noaspect

Time: ✅ 1.202µs (SLO: <10.000µs 📉 -88.0%) vs baseline: -0.5%

Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +4.8%

✅ re_sub_aspect

Time: ✅ 3.365µs (SLO: <10.000µs 📉 -66.3%) vs baseline: ~same

Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +5.0%

✅ re_sub_noaspect

Time: ✅ 1.524µs (SLO: <10.000µs 📉 -84.8%) vs baseline: -0.3%

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.8%

✅ re_subn_aspect

Time: ✅ 3.675µs (SLO: <10.000µs 📉 -63.2%) vs baseline: +2.3%

Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.8%

✅ re_subn_noaspect

Time: ✅ 1.605µs (SLO: <10.000µs 📉 -83.9%) vs baseline: -0.6%

Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +5.1%

✅ iastaspectssplit - 12/12

✅ rsplit_aspect

Time: ✅ 1.443µs (SLO: <10.000µs 📉 -85.6%) vs baseline: +2.0%

Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.2%

✅ rsplit_noaspect

Time: ✅ 0.591µs (SLO: <10.000µs 📉 -94.1%) vs baseline: +1.3%

Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9%

✅ split_aspect

Time: ✅ 1.420µs (SLO: <10.000µs 📉 -85.8%) vs baseline: +1.5%

Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +5.1%

✅ split_noaspect

Time: ✅ 0.572µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -0.5%

Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +4.6%

✅ splitlines_aspect

Time: ✅ 1.411µs (SLO: <10.000µs 📉 -85.9%) vs baseline: ~same

Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +5.4%

✅ splitlines_noaspect

Time: ✅ 0.588µs (SLO: <10.000µs 📉 -94.1%) vs baseline: +0.2%

Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.5%

✅ iastpropagation - 2/2

✅ no-propagation

Time: ✅ 48.649µs (SLO: <60.000µs 📉 -18.9%) vs baseline: -0.6%

Memory: ✅ 37.493MB (SLO: <39.000MB -3.9%) vs baseline: +4.5%

✅ otelsdkspan - 24/24

✅ add-event

Time: ✅ 40.938ms (SLO: <42.000ms -2.5%) vs baseline: +1.2%

Memory: ✅ 34.406MB (SLO: <39.000MB 📉 -11.8%) vs baseline: +4.9%

✅ add-link

Time: ✅ 36.895ms (SLO: <38.550ms -4.3%) vs baseline: +1.5%

Memory: ✅ 34.387MB (SLO: <39.000MB 📉 -11.8%) vs baseline: +4.9%

✅ add-metrics

Time: ✅ 218.323ms (SLO: <232.000ms -5.9%) vs baseline: -0.8%

Memory: ✅ 34.387MB (SLO: <39.000MB 📉 -11.8%) vs baseline: +5.2%

✅ add-tags

Time: ✅ 210.412ms (SLO: <221.600ms -5.0%) vs baseline: -0.2%

Memory: ✅ 34.446MB (SLO: <39.000MB 📉 -11.7%) vs baseline: +5.2%

✅ get-context

Time: ✅ 29.057ms (SLO: <31.300ms -7.2%) vs baseline: +0.3%

Memory: ✅ 34.406MB (SLO: <39.000MB 📉 -11.8%) vs baseline: +5.0%

✅ is-recording

Time: ✅ 28.967ms (SLO: <31.000ms -6.6%) vs baseline: -1.2%

Memory: ✅ 34.387MB (SLO: <39.000MB 📉 -11.8%) vs baseline: +5.0%

✅ record-exception

Time: ✅ 63.451ms (SLO: <65.850ms -3.6%) vs baseline: +0.2%

Memory: ✅ 34.347MB (SLO: <39.000MB 📉 -11.9%) vs baseline: +4.6%

✅ set-status

Time: ✅ 31.821ms (SLO: <34.150ms -6.8%) vs baseline: +0.2%

Memory: ✅ 34.387MB (SLO: <39.000MB 📉 -11.8%) vs baseline: +5.1%

✅ start

Time: ✅ 29.105ms (SLO: <30.150ms -3.5%) vs baseline: +1.1%

Memory: ✅ 34.485MB (SLO: <39.000MB 📉 -11.6%) vs baseline: +5.1%

✅ start-finish

Time: ✅ 33.539ms (SLO: <35.350ms -5.1%) vs baseline: -1.2%

Memory: ✅ 34.288MB (SLO: <39.000MB 📉 -12.1%) vs baseline: +4.7%

✅ start-finish-telemetry

Time: ✅ 33.989ms (SLO: <35.450ms -4.1%) vs baseline: +0.5%

Memory: ✅ 34.347MB (SLO: <39.000MB 📉 -11.9%) vs baseline: +4.7%

✅ update-name

Time: ✅ 31.106ms (SLO: <33.400ms -6.9%) vs baseline: +0.3%

Memory: ✅ 34.485MB (SLO: <39.000MB 📉 -11.6%) vs baseline: +5.3%

✅ otelspan - 22/22

✅ add-event

Time: ✅ 40.905ms (SLO: <47.150ms 📉 -13.2%) vs baseline: +1.9%

Memory: ✅ 43.623MB (SLO: <47.000MB -7.2%) vs baseline: +4.9%

✅ add-metrics

Time: ✅ 316.510ms (SLO: <344.800ms -8.2%) vs baseline: -0.8%

Memory: ✅ 652.341MB (SLO: <675.000MB -3.4%) vs baseline: +4.9%

✅ add-tags

Time: ✅ 287.014ms (SLO: <314.000ms -8.6%) vs baseline: ~same

Memory: ✅ 654.266MB (SLO: <675.000MB -3.1%) vs baseline: +4.7%

✅ get-context

Time: ✅ 80.638ms (SLO: <92.350ms 📉 -12.7%) vs baseline: +0.4%

Memory: ✅ 39.379MB (SLO: <46.500MB 📉 -15.3%) vs baseline: +4.6%

✅ is-recording

Time: ✅ 38.014ms (SLO: <44.500ms 📉 -14.6%) vs baseline: -0.2%

Memory: ✅ 42.870MB (SLO: <47.500MB -9.7%) vs baseline: +4.3%

✅ record-exception

Time: ✅ 58.109ms (SLO: <67.650ms 📉 -14.1%) vs baseline: ~same

Memory: ✅ 39.779MB (SLO: <47.000MB 📉 -15.4%) vs baseline: +4.9%

✅ set-status

Time: ✅ 43.864ms (SLO: <50.400ms 📉 -13.0%) vs baseline: -0.2%

Memory: ✅ 43.047MB (SLO: <47.000MB -8.4%) vs baseline: +4.7%

✅ start

Time: ✅ 37.971ms (SLO: <43.450ms 📉 -12.6%) vs baseline: +1.7%

Memory: ✅ 42.991MB (SLO: <47.000MB -8.5%) vs baseline: +4.9%

✅ start-finish

Time: ✅ 81.820ms (SLO: <88.000ms -7.0%) vs baseline: +0.3%

Memory: ✅ 34.485MB (SLO: <46.500MB 📉 -25.8%) vs baseline: +4.9%

✅ start-finish-telemetry

Time: ✅ 83.449ms (SLO: <89.000ms -6.2%) vs baseline: +0.3%

Memory: ✅ 34.465MB (SLO: <46.500MB 📉 -25.9%) vs baseline: +5.0%

✅ update-name

Time: ✅ 39.342ms (SLO: <45.150ms 📉 -12.9%) vs baseline: +1.1%

Memory: ✅ 43.389MB (SLO: <47.000MB -7.7%) vs baseline: +5.0%

✅ packagespackageforrootmodulemapping - 4/4

✅ cache_off

Time: ✅ 342.810ms (SLO: <354.300ms -3.2%) vs baseline: -0.8%

Memory: ✅ 38.589MB (SLO: <40.000MB -3.5%) vs baseline: +4.4%

✅ cache_on

Time: ✅ 0.384µs (SLO: <10.000µs 📉 -96.2%) vs baseline: +0.3%

Memory: ✅ 36.378MB (SLO: <39.000MB -6.7%) vs baseline: +4.6%

✅ recursivecomputation - 8/8

✅ deep

Time: ✅ 308.846ms (SLO: <320.950ms -3.8%) vs baseline: +0.1%

Memory: ✅ 32.735MB (SLO: <34.500MB -5.1%) vs baseline: +4.4%

✅ deep-profiled

Time: ✅ 328.223ms (SLO: <359.150ms -8.6%) vs baseline: ~same

Memory: ✅ 37.317MB (SLO: <39.000MB -4.3%) vs baseline: +0.7%

✅ medium

Time: ✅ 7.010ms (SLO: <7.400ms -5.3%) vs baseline: +0.1%

Memory: ✅ 32.008MB (SLO: <34.000MB -5.9%) vs baseline: +5.0%

✅ shallow

Time: ✅ 0.944ms (SLO: <1.050ms 📉 -10.1%) vs baseline: +0.4%

Memory: ✅ 31.968MB (SLO: <34.000MB -6.0%) vs baseline: +5.1%

✅ span - 26/26

✅ add-event

Time: ✅ 20.022ms (SLO: <22.500ms 📉 -11.0%) vs baseline: -0.3%

Memory: ✅ 48.404MB (SLO: <53.000MB -8.7%) vs baseline: +4.6%

✅ add-metrics

Time: ✅ 89.982ms (SLO: <93.500ms -3.8%) vs baseline: -0.5%

Memory: ✅ 735.945MB (SLO: <961.000MB 📉 -23.4%) vs baseline: +5.0%

✅ add-tags

Time: ✅ 147.148ms (SLO: <155.000ms -5.1%) vs baseline: ~same

Memory: ✅ 736.145MB (SLO: <962.500MB 📉 -23.5%) vs baseline: +4.9%

✅ get-context

Time: ✅ 18.408ms (SLO: <20.500ms 📉 -10.2%) vs baseline: +0.4%

Memory: ✅ 47.308MB (SLO: <53.000MB 📉 -10.7%) vs baseline: +4.7%

✅ is-recording

Time: ✅ 18.570ms (SLO: <20.500ms -9.4%) vs baseline: +0.1%

Memory: ✅ 47.392MB (SLO: <53.000MB 📉 -10.6%) vs baseline: +5.0%

✅ record-exception

Time: ✅ 37.433ms (SLO: <40.000ms -6.4%) vs baseline: ~same

Memory: ✅ 41.743MB (SLO: <53.000MB 📉 -21.2%) vs baseline: +5.1%

✅ set-status

Time: ✅ 20.312ms (SLO: <22.000ms -7.7%) vs baseline: +1.5%

Memory: ✅ 47.270MB (SLO: <53.000MB 📉 -10.8%) vs baseline: +4.8%

✅ start

Time: ✅ 18.007ms (SLO: <20.500ms 📉 -12.2%) vs baseline: -0.4%

Memory: ✅ 47.288MB (SLO: <53.000MB 📉 -10.8%) vs baseline: +4.8%

✅ start-finish

Time: ✅ 50.657ms (SLO: <52.500ms -3.5%) vs baseline: -0.4%

Memory: ✅ 31.968MB (SLO: <34.000MB -6.0%) vs baseline: +4.9%

✅ start-finish-telemetry

Time: ✅ 52.191ms (SLO: <54.500ms -4.2%) vs baseline: +0.2%

Memory: ✅ 31.968MB (SLO: <34.000MB -6.0%) vs baseline: +4.8%

✅ start-finish-traceid128

Time: ✅ 54.194ms (SLO: <57.000ms -4.9%) vs baseline: +0.2%

Memory: ✅ 31.949MB (SLO: <34.000MB -6.0%) vs baseline: +4.6%

✅ start-traceid128

Time: ✅ 18.637ms (SLO: <22.500ms 📉 -17.2%) vs baseline: +0.8%

Memory: ✅ 47.271MB (SLO: <53.000MB 📉 -10.8%) vs baseline: +4.8%

✅ update-name

Time: ✅ 18.827ms (SLO: <22.000ms 📉 -14.4%) vs baseline: +0.5%

Memory: ✅ 47.952MB (SLO: <53.000MB -9.5%) vs baseline: +4.8%

✅ telemetryaddmetric - 30/30

✅ 1-count-metric-1-times

Time: ✅ 2.960µs (SLO: <20.000µs 📉 -85.2%) vs baseline: +0.7%

Memory: ✅ 31.929MB (SLO: <34.000MB -6.1%) vs baseline: +4.8%

✅ 1-count-metrics-100-times

Time: ✅ 203.006µs (SLO: <220.000µs -7.7%) vs baseline: +2.3%

Memory: ✅ 31.949MB (SLO: <34.000MB -6.0%) vs baseline: +4.9%

✅ 1-distribution-metric-1-times

Time: ✅ 3.307µs (SLO: <20.000µs 📉 -83.5%) vs baseline: +1.8%

Memory: ✅ 32.027MB (SLO: <34.000MB -5.8%) vs baseline: +5.0%

✅ 1-distribution-metrics-100-times

Time: ✅ 213.110µs (SLO: <220.000µs -3.1%) vs baseline: +0.5%

Memory: ✅ 31.968MB (SLO: <34.000MB -6.0%) vs baseline: +4.9%

✅ 1-gauge-metric-1-times

Time: ✅ 2.208µs (SLO: <20.000µs 📉 -89.0%) vs baseline: +2.4%

Memory: ✅ 31.929MB (SLO: <34.000MB -6.1%) vs baseline: +4.9%

✅ 1-gauge-metrics-100-times

Time: ✅ 136.210µs (SLO: <150.000µs -9.2%) vs baseline: +0.5%

Memory: ✅ 31.909MB (SLO: <34.000MB -6.1%) vs baseline: +4.3%

✅ 1-rate-metric-1-times

Time: ✅ 3.082µs (SLO: <20.000µs 📉 -84.6%) vs baseline: ~same

Memory: ✅ 31.929MB (SLO: <34.000MB -6.1%) vs baseline: +4.7%

✅ 1-rate-metrics-100-times

Time: ✅ 214.110µs (SLO: <250.000µs 📉 -14.4%) vs baseline: +0.4%

Memory: ✅ 31.988MB (SLO: <34.000MB -5.9%) vs baseline: +4.9%

✅ 100-count-metrics-100-times

Time: ✅ 20.634ms (SLO: <22.000ms -6.2%) vs baseline: +3.9%

Memory: ✅ 31.929MB (SLO: <34.000MB -6.1%) vs baseline: +4.8%

✅ 100-distribution-metrics-100-times

Time: ✅ 2.232ms (SLO: <2.300ms -3.0%) vs baseline: -0.2%

Memory: ✅ 31.949MB (SLO: <34.000MB -6.0%) vs baseline: +4.7%

✅ 100-gauge-metrics-100-times

Time: ✅ 1.404ms (SLO: <1.550ms -9.4%) vs baseline: +0.6%

Memory: ✅ 32.047MB (SLO: <34.000MB -5.7%) vs baseline: +5.2%

✅ 100-rate-metrics-100-times

Time: ✅ 2.218ms (SLO: <2.550ms 📉 -13.0%) vs baseline: +2.4%

Memory: ✅ 31.968MB (SLO: <34.000MB -6.0%) vs baseline: +4.9%

✅ flush-1-metric

Time: ✅ 4.475µs (SLO: <20.000µs 📉 -77.6%) vs baseline: +0.6%

Memory: ✅ 32.008MB (SLO: <34.000MB -5.9%) vs baseline: +5.1%

✅ flush-100-metrics

Time: ✅ 175.044µs (SLO: <250.000µs 📉 -30.0%) vs baseline: -0.2%

Memory: ✅ 31.949MB (SLO: <34.000MB -6.0%) vs baseline: +4.9%

✅ flush-1000-metrics

Time: ✅ 2.123ms (SLO: <2.500ms 📉 -15.1%) vs baseline: -0.3%

Memory: ✅ 32.716MB (SLO: <34.500MB -5.2%) vs baseline: +4.8%

✅ tracer - 6/6

✅ large

Time: ✅ 29.031ms (SLO: <32.950ms 📉 -11.9%) vs baseline: +0.5%

Memory: ✅ 32.794MB (SLO: <34.500MB -4.9%) vs baseline: +5.0%

✅ medium

Time: ✅ 2.898ms (SLO: <3.200ms -9.4%) vs baseline: +0.4%

Memory: ✅ 31.615MB (SLO: <34.000MB -7.0%) vs baseline: +5.1%

✅ small

Time: ✅ 325.805µs (SLO: <370.000µs 📉 -11.9%) vs baseline: +0.4%

Memory: ✅ 31.634MB (SLO: <34.000MB -7.0%) vs baseline: +5.0%

ℹ️ Scenarios Missing SLO Configuration (8 scenarios)

The following scenarios exist in candidate data but have no SLO thresholds configured:

coreapiscenario-core_dispatch_listeners
coreapiscenario-core_dispatch_no_listeners
coreapiscenario-core_dispatch_with_results_listeners
coreapiscenario-core_dispatch_with_results_no_listeners
djangosimple-baseline
errortrackingdjangosimple-baseline
errortrackingflasksqli-baseline
flasksimple-baseline

KowalskiThomas force-pushed the kowalski/perf-profiling-intern-strings-into-libdatadog branch from f806eda to ff3a20e Compare October 28, 2025 11:51

KowalskiThomas force-pushed the kowalski/perf-profiling-intern-strings-into-libdatadog branch from ff3a20e to f598d1c Compare October 28, 2025 13:21

KowalskiThomas changed the title ~~libdatadog: update~~ perf(profiling): try to intern strings/functions Oct 28, 2025

morrisonlevi reviewed Oct 28, 2025

ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp (outdated, resolved)

KowalskiThomas force-pushed the kowalski/perf-profiling-intern-strings-into-libdatadog branch 6 times, most recently from 2864f97 to 091022c Compare October 30, 2025 08:51

KowalskiThomas force-pushed the kowalski/perf-profiling-intern-strings-into-libdatadog branch 13 times, most recently from f925a11 to 94e2fa2 Compare October 31, 2025 16:25

KowalskiThomas added the changelog/no-changelog A changelog entry is not required for this PR. label Oct 31, 2025

KowalskiThomas added 2 commits November 6, 2025 15:30

chore(libdatadog): update

0e4e887

perf(profiling): intern strings into libdatadog

ef96a13

KowalskiThomas force-pushed the kowalski/perf-profiling-intern-strings-into-libdatadog branch from 94e2fa2 to ef96a13 Compare November 6, 2025 14:30
